Experiments on speech tracking in audio documents using Gaussian mixture modeling
Abstract
This paper deals with the tracking of speech segments in audio documents. We use a cepstral-based acoustic analysis and Gaussian mixture models to represent the training data. Three ways of scoring an audio document, each based on a frame-level likelihood calculation, are proposed and compared. Our experiments use a database of television programs that includes news reports, advertisements, and documentaries. The best equal error rate obtained is approximately 12%.
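The abstract does not say what the three scoring methods are, so the following is only a minimal sketch of the general idea: compute a frame-level log-likelihood under a diagonal-covariance Gaussian mixture model, turn it into a segment score, and measure an equal error rate. The function names, the use of a second (non-speech) model, and the averaged log-likelihood ratio are illustrative assumptions, not the paper's method.

```python
import numpy as np

def gmm_frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) feature vectors (e.g. cepstral coefficients)
    weights: (K,), means: (K, D), variances: (K, D)
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, K)
    a = np.log(weights) + log_comp
    # Log-sum-exp over components, for numerical stability.
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def segment_score(frames, speech_gmm, other_gmm):
    """Assumed scoring rule: mean frame-level log-likelihood ratio."""
    llr = gmm_frame_loglik(frames, *speech_gmm) - gmm_frame_loglik(frames, *other_gmm)
    return float(np.mean(llr))

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: error rate at the threshold where the miss rate
    and the false-alarm rate are closest."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    best_gap, best_eer = None, None
    for t in np.sort(np.concatenate([target_scores, nontarget_scores])):
        miss = np.mean(target_scores < t)             # speech rejected
        false_alarm = np.mean(nontarget_scores >= t)  # non-speech accepted
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (miss + false_alarm) / 2.0
    return float(best_eer)
```

A segment would be labeled as speech when its score exceeds a threshold; sweeping that threshold trades misses against false alarms, and the equal error rate is the point where the two rates coincide.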
Similar resources
Robust text-independent speaker identification using Gaussian mixture speaker models
This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications which require high identification rates using short utterance ...
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered, and the attributes of the speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Mouth Region Localization Method Based on Gaussian Mixture Model
This paper presents a new mouth region localization method which uses the Gaussian mixture model (GMM) of feature vectors extracted from mouth region images. Feature vectors based on the discrete cosine transform (DCT) and principal component analysis (PCA) are evaluated in mouth localization experiments. The new method is suitable for audio-visual speech recognition. This paper also introduc...
Speech Enhancement using Laplacian Mixture Model under Signal Presence Uncertainty
In this paper, an estimator for speech enhancement based on a Laplacian mixture model is proposed. The proposed method estimates the complex DFT coefficients of clean speech from noisy speech using the MMSE estimator, where the clean speech DFT coefficients are assumed to follow a mixture of Laplacians and the DFT coefficients of the noise are assumed to follow a zero-mean Gaussian distribution. Furthermore, the MMS...
Efficient likelihood computation in multi-stream HMM based audio-visual speech recognition
Multi-stream hidden Markov models have recently been introduced in the field of automatic speech recognition as an alternative to single-stream modeling of sequences of speech informative features. In particular, they have been very successful in audio-visual speech recognition, where features extracted from video of the speaker’s lips are also available. However, in contrast to single-stream m...